(54) Pipeline processing method and device for instruction execution



(57) A second execution unit, such as a coprocessor
incorporated into a processor, is connected such that the
direction of its processing flow is opposite to that of the
main pipeline processing flow, and executes high-speed
multiplication operations and specific operations.
Conventionally, the second execution unit has been provided
in the same direction as a first execution unit. With
this prior art arrangement, the second execution unit is
initiated at an early stage of pipeline processing. With
the arrangement of this invention, the second execution
unit is initiated at a later stage, giving sufficient time
before all the operation data are prepared. Thus, it is
unnecessary for the apparatus to delay subsequent
processing until all operation data become available,
thereby enhancing processing performance.

Description

BACKGROUND OF THE INVENTION 

Field of the Invention: 

The present invention relates to a method for executing
instructions through pipeline processing and to a
device therefor. In this method, a subpath is provided so as
to detour around a part of a main path for pipeline
processing, and assumes part of the processes in the main
path. This invention has possible applications in, for
example, microprocessors having a coprocessor.

Description of the Prior Art: 

A single-chip RISC (Reduced Instruction Set Computer)
is a device for simultaneously realizing high
processing performance, low power consumption, and
a small mounting area, primarily in specific applications
including image processing. Recently, a dedicated
arithmetic logic circuit has often been provided in this type
of microprocessor to further enhance arithmetic
performance.

One example of this type of microprocessor is the
V851 produced by NEC, Ltd. According to NEC Technology
Report (Vol. 4o, No. 3/1995, pages 42-47), the
V851 adopts a pipeline RISC architecture that includes,
in addition to an ordinary ALU, a hardware multiplier unit
called an MULU for high-speed execution of multiplication
instructions.

Fig. 1 shows the internal organization of the V851.
This diagram is based on the schematic diagram
on page 45 of the above document. As shown in the
diagram, this microprocessor comprises an instruction
memory 100 for storing instructions to be executed; an
instruction fetch unit 101 for sequentially reading
instructions; an instruction decoder 102 for decoding read
instructions; a general-purpose register group 103 to be
accessed based on a general register number identified
from a decoded result; a first execution unit 106 for
receiving one or two source operands read from a
general-purpose register via buses 114 and 115 and executing
general operations (hereinafter "general execution")
according to the decoded result; a memory access unit 108
for reading data necessary for a process from data
memory 107 according to the result of the operation
execution or writing data of a processed result into data
memory 107; and a general-purpose register write unit
109 for receiving data read from data memory 107 via
bus 116 and writing them into a predetermined register
in general-purpose register group 103.

The above units together constitute a main path for
pipeline processing. Fig. 1 also shows a bus 112 leading
from memory access unit 108 to the input side of first
execution unit 106, and a bus 113 leading from the
output side of first execution unit 106 to the input side
thereof. These buses 112 and 113 are necessary for
achieving "data forwarding" (described later).

In addition to the above, another path (hereinafter
"subpath") is provided, detouring around a part of the
main path. On this path, a second execution unit 110
(corresponding to MULU) is provided. The second execution
unit 110, which is dedicated to multiplication operations,
assumes the processes to be executed by the
first execution unit 106 when a multiplication instruction
is decoded. Pipeline processing flows from top to bottom
along both the main path and the subpath in Fig. 1.

Pipeline processing is executed through process
units each called a stage. Each stage is processed in a
constant time period determined according to an operation
clock of a microprocessor. Many microprocessors,
including the V851, execute instructions by dividing
them into five stages outlined below. Each stage is
named as follows, but this nomenclature is only for
convenience.

1. I stage

Instructions are fetched (read) by instruction fetch
unit 101 (see Fig. 1).

2. R stage

Instructions are decoded by instruction decoder
102. A read operation from general-purpose registers is
concurrently executed here.

3. A stage

First execution unit 106 (ALU) executes general
operations. Multiplication operations are executed by
second execution unit 110 (MULU). A memory address is
generated for use in the following stage.

4. M stage

Memory access unit 108 accesses data memory 107.

5. W stage

General-purpose register write unit 109 writes data
which has been read into a general-purpose register.

For the V851, each of the above stages is generally
completed within one cycle (one clock) except for
multiplication operations by MULU, which require two
cycles to complete.

Fig. 2 shows a state of pipeline processing by a general
microprocessor having the organization shown in
Fig. 1. Four instructions, LD, LD, MUL, and ST, are
necessary to write a product of two data stored in data
memory 107 back into the memory 107. In this diagram, an
arbitrary instruction XX follows the ST instruction. With
the initial instruction LD R1, (R8), data stored at address
R8 in data memory 107 is read and transferred to register
R1 in general-purpose register group 103. Similarly, with
the following instruction, data stored at address R9 is
transferred to register R2. With the third MUL instruction,
data in registers R1 and R2 are multiplied with each other,
and the result is stored in register R1. Finally, with the ST
instruction, data in register R1 is stored at address R10
in data memory 107.
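
The net effect of this four-instruction sequence can be sketched in a few lines of Python. This is only an illustration of the register and memory traffic described above, not V851 code: the tuple encoding, the function name run, the operand form assumed for ST (ST R1, (R10)), and the sample addresses and values are all assumptions made for the example.

```python
def run(program, regs, mem):
    """Execute LD / MUL / ST instructions one after another (no pipelining)."""
    for op, a, b in program:
        if op == "LD":       # LD Ra, (Rb): load mem[address held in Rb] into Ra
            regs[a] = mem[regs[b]]
        elif op == "MUL":    # MUL Ra, Rb: Ra <- Ra * Rb
            regs[a] = regs[a] * regs[b]
        elif op == "ST":     # ST Ra, (Rb): store Ra at the address held in Rb
            mem[regs[b]] = regs[a]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs, mem


program = [
    ("LD", "R1", "R8"),
    ("LD", "R2", "R9"),
    ("MUL", "R1", "R2"),
    ("ST", "R1", "R10"),
]
regs = {"R8": 100, "R9": 104, "R10": 108}  # hypothetical addresses
mem = {100: 6, 104: 7}                      # hypothetical operand values
regs, mem = run(program, regs, mem)
print(mem[108])
```

The sketch deliberately ignores timing; how the five stages of these instructions overlap is the subject of Fig. 2.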

Fig. 2 shows a state where respective instructions 
are executed through five stages. At M stage of the first 
instruction, data is read from address R8 in memory. 
The read data is written into register R1 at the following 
W stage. Similarly, data is read from address R9 in 
memory at M stage of the subsequent instruction and 
written into register R2 at W stage thereof. 

The third MUL instruction is executed at A
stage. For this, values in registers R1 and R2 must be
ready by the start of this stage. In general, the value in
register R2 is not ready until W stage of the second
instruction (LD instruction). Here, however, this value is
extracted from M stage, which immediately precedes
W stage, via bus 112 (see Fig. 1), and
transferred to A stage of the MUL instruction as indicated
by the arrow a. This is the data forwarding method. In
this example, this arrangement lets the A stage of the
MUL instruction start one cycle earlier.
(Note that forwarding with bus 113 is not related to this
invention.)

MULU initiates execution of a multiplication opera- 
tion at the start of the MUL instruction A stage, and com- 
pletes it within two clock cycles by the end of M stage. 
For the last ST instruction, the data forwarding method 
is also applied (indicated with the arrow b) and data on 
the result of the multiplication operation is thereby trans- 
ferred from the end of M stage of MUL instruction to the 
beginning of A stage of ST instruction. Subsequently, at
M stage, that data is stored at address R10 in memory.

In Fig. 2, A stage of MUL instruction starts after
completion of M stage of its immediately preceding LD
instruction. Thus, a stage for waiting, denoted with (R),
is inserted in between. Because of this insertion, the
following ST instruction cannot progress to its R stage, so
that a stage for waiting, denoted with (I), is inserted into
the execution for ST instruction. Further, the following A
stage of ST instruction must wait until completion of M
stage of MUL instruction. This requires another (R)
stage to be inserted for waiting. This in turn demands
another (I) stage to be inserted for waiting into the
execution for XX instruction. If a process period for one
multiplication operation is defined as from the beginning of
W stage of the initial LD instruction to the beginning of
W stage of XX instruction, this microprocessor requires
six cycles to complete one multiplication operation.
Although there may be other ways to define a process
period, the aforementioned definition is natural when the
following periods after XX instruction are similarly
defined. This is because a process period headed by XX
instruction can be counted beginning with W stage
thereof.
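
The six-cycle figure can be reproduced with a small scheduling sketch. The model below is an assumption-laden simplification rather than a description of the V851: it numbers cycles from 1, lets an instruction enter a stage only after its predecessor has left it, and treats every result (a loaded value or the MULU product) as forwardable to a consumer's A stage once the producer's M stage has finished. The function name schedule and the (name, producers) encoding are hypothetical.

```python
STAGES = ["I", "R", "A", "M", "W"]


def schedule(program):
    """Return, per instruction, the cycle in which it first occupies each stage.

    program: list of (name, producers). producers holds the indices of earlier
    instructions whose result, available at the end of their M stage, is
    needed at the start of this instruction's A stage.
    """
    table = []
    for i, (_name, producers) in enumerate(program):
        prev = table[i - 1] if i else None
        enter = {}
        for s, stage in enumerate(STAGES):
            if s == 0:
                # The I stage is free once the previous instruction moves to R.
                t = prev["R"] if prev else 1
            else:
                t = enter[STAGES[s - 1]] + 1
                if prev and s + 1 < len(STAGES):
                    # Wait until the previous instruction has left this stage.
                    t = max(t, prev[STAGES[s + 1]])
            if stage == "A":
                for p in producers:
                    # Operands are forwarded after the producer's M stage.
                    t = max(t, table[p]["M"] + 1)
            enter[stage] = t
        table.append(enter)
    return table


program = [
    ("LD", []),       # LD R1, (R8)
    ("LD", []),       # LD R2, (R9)
    ("MUL", [0, 1]),  # needs both loaded values
    ("ST", [2]),      # needs the product
    ("XX", []),       # arbitrary following instruction
]
table = schedule(program)
period = table[4]["W"] - table[0]["W"]
print(table[2], period)
```

Under these assumptions the MUL instruction enters its A stage in cycle 6 instead of cycle 5, and the process period measured between the W stages of the first LD and of XX comes out as six cycles, matching the count given for Fig. 2.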

A process for writing a product of two data stored in
a memory back into the memory is frequently applied
in general signal processing, including image
processing. One of the main objects of a RISC
microprocessor is to achieve utmost performance in some
specific usages. From a macroscopic point of view, six
cycles are consumed for executing four instructions in Fig.
2. Of those six cycles, however, one cycle is only for
waiting. This waiting cycle is only necessary because
one multiplication operation takes two cycles to
complete. Thus, in theory, these six cycles can be reduced
to five cycles.

One extra cycle is still needed because the MUL
instruction A stage must wait until completion of the M
stage of the immediately preceding LD instruction. A
programming method has been known for solving
this problem. In this method, two or more multiplication
operations are grouped together. For
instance, if an operation requires processes for writing a
product of values stored at addresses R10 and R11 of
a memory back into the memory, in addition to the
processes shown in Fig. 2, all LD instructions are coded prior
to other instructions as follows:



    LD   R1, (R8)
    LD   R2, (R9)
    LD   R3, (R10)
    LD   R4, (R11)
    MUL  R1, R2
    MUL  R3, R4



While the third and fourth LD instructions are
executed, a wait stage for the initial MUL instruction
becomes unnecessary. Similarly, while the initial MUL
instruction is executed, a wait stage for the next MUL
instruction also becomes unnecessary. However, this
method relies on programming and is not usable in
cases when an operation includes only one multiplication,
such as the case shown in Fig. 2. Thus, not only is
such programming troublesome, but improvement
through such programming is subject to a limit. In
actuality, much improvement cannot be expected through
programming.

SUMMARY OF THE INVENTION 

The present invention has been conceived to overcome
the above problems and aims to provide an
instruction execution method which eliminates wait
stages, and an apparatus therefor, by reorganizing a
hardware structure. This invention provides a method and
an apparatus which can constantly achieve improved
performance irrespective of processing context, and
require no special programming techniques.

(1) In a first aspect of the present invention, there is
provided a method for executing instructions through
pipeline processing, wherein, in addition to a main path for
pipeline processing, a subpath is provided so as to
detour around a part of the main path for assuming
a portion of the main path function or processes, the
subpath is connected such that processing therein flows
in an opposite direction to that in the main path, and a
process is entrusted from the main path to the subpath
at a later stage of the pipeline processing executed in
the main path than the stage at which a processed result
is received by the main path from the subpath.

The "main path" signifies a path formed through the
cooperation of various processing units for the basic
flow of the pipeline processing. In a conventional
instruction execution method, the processing flows in the
same direction at a branch from a main path to a subpath
as the direction of the processing in the main path. Thus,
a processed result obtained in the subpath joins the
main path downstream. In this invention, on the other
hand, the processing flows in the opposite direction in
the subpath. That is, entrustment of a process from the
main path to a subpath is done downstream of the
receipt of a processed result by the main path from the
subpath along the pipeline. As a result, initiation of the
processing in the subpath can be delayed so that
pipeline processing in the main path can correspondingly
progress.

(2) In a second aspect of the present invention, there is
provided an apparatus for executing instructions
through pipeline processing by dividing them into a
plurality of stages, comprising a plurality of stage process
sections each responsible for processing at each stage,
constituting a main path for pipeline processing; and an
execution section provided on a subpath for executing
a predetermined operation through processing which
flows in an opposite direction to that in the main path.

The term "operation" herein is not limited to numerical
operations, and signifies in general a unit process
of various control functions. An example of the
"operation execution unit" is a coprocessor which executes
specific operations. A general operation execution unit
may be provided on a main path; in such a case, however,
it is different from the "operation execution unit"
mentioned herein.

In this aspect, each stage process section on the 
main path executes instructions through pipeline 
processing. The operation execution unit on a subpath 
executes a predetermined operation through process- 
ing which flows in the opposite direction to that in the 
main path. 

(3) In a third aspect, an apparatus for executing instruc- 
tions comprises a processor for executing instructions 
through pipeline processing and a coprocessor for as- 
suming a predetermined operation among processes by 
the processor, wherein the coprocessor is connected to 
the processor in an opposite direction such that 
processing by the coprocessor flows in an opposite di- 
rection to that by the processor. 



In this aspect, entrustment of an operation, namely,
transfer of the content of and data for an operation, from
a processor to a coprocessor is done downstream of the
point where an operation result is transferred from the
coprocessor to the processor. As a result, the start of
the processing by the coprocessor can be delayed so
that pipeline processing by the processor can
correspondingly progress. In addition, an operation result
yielded by the coprocessor may be directly given to a
memory stage which needs the result.

(4) In a fourth aspect, in an apparatus according to the
third aspect, the pipeline processing by the processor
includes a memory access stage as one of processing
stages, the processor comprises a memory access
section for controlling the memory access stage, and a data
input unit of the coprocessor and a data output unit
thereof are connected to a data output unit of the
memory access section and a data input unit thereof,
respectively.

In this aspect, a coprocessor can be activated after
receiving the result of an access made by the memory
access section. In addition, the operation result yielded
by the coprocessor may be directly utilized by the
memory access section.

(5) In a fifth aspect, in an apparatus according to the
fourth aspect, the processor comprises an instruction
fetch section, an instruction decode section, a general
execution section, a memory access section, and a
register write section, each responsible for a corresponding
stage of pipeline processing, and a data input unit of the
coprocessor and a data output unit thereof are connected
to a data output unit of the memory access section
and a data input unit thereof, respectively, via dedicated
buses.

In this aspect, since the coprocessor can be activated
after receiving the result of an access made by
the memory access section, the processor does not
need to activate the coprocessor at a process stage
assigned to a general execution section (e.g., A stage in
Fig. 2). Instead, the processor can activate the
coprocessor at the following process stage (e.g., M stage in
Fig. 2). In addition, since the coprocessor and the
memory access section are connected with each other via a
dedicated bus, data can be transmitted between them,
neither being hindered by nor hindering other
processing.

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and other objects, features, and advantages
will become further apparent from the following
description of the preferred embodiment taken in
conjunction with the accompanying drawings wherein:

Fig. 1 is a schematic diagram showing an internal
organization of a V851;

Fig. 2 shows a state of pipeline processing by a general
microprocessor employing the structure shown
in Fig. 1;

Fig. 3 is a schematic diagram showing an internal
organization of an instruction execution apparatus
of a preferred embodiment, focusing on relationship
among process stages;

Fig. 4 shows in detail the structure of an apparatus
of the preferred embodiment;

Fig. 5 is a schematic diagram showing an internal
organization of a coprocessor of the preferred
embodiment;

Fig. 6 shows a state of pipeline processing by an
apparatus of the preferred embodiment; and

Fig. 7 shows a state of executing the same operation
as that in Fig. 6 without a MUL instruction.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS

Preferred embodiments of the operation execution
apparatus of this invention are described here with
reference to the drawings. The description of this
apparatus also clarifies the operation execution method related
to this invention. In the embodiment, a pipeline
microprocessor is considered for the operation execution
apparatus. A section corresponding to the MULU in the
V851 is implemented by a coprocessor.

[Configuration] 

Fig. 3 is a schematic diagram showing the internal 
organization of an operation execution apparatus ac- 
cording to this embodiment. This diagram is presented 
focusing on the relationship among processing stages. 
In the diagram, similar or equivalent members to those 
in Fig. 1 are given the same reference numerals and 
their explanations are omitted. 

The vertical path in Fig. 3 leading from the instruction
fetch unit 101 to the general-purpose register write
unit 109, which constitutes a main path, is incorporated
into a processor. A second execution unit 110, or a
coprocessor, is provided in the direction from W stage to
A stage on a subpath which detours around a part of the
main path. In this apparatus, the second execution unit
110, as well as a memory access unit 108, is connected
via a bus 117 behind the first execution unit 106, or an
ALU. This structure differs essentially from that
shown in Fig. 1 in the direction in which second execution
unit 110 is connected. More concretely, second
execution unit 110 is connected in the direction opposite
to the flow direction of pipeline processing. That is, the
data input of second execution unit 110 is connected to
the data output of memory access unit 108, and the data
output of second execution unit 110 is connected to the
data input of memory access unit 108. Dedicated buses are
used for these connections.

With this arrangement, second execution unit 110
can initiate execution of operations upon completion of
the processing at A stage. This structure contrasts with
that of Fig. 1, in which execution of operations
must be initiated at the same time as the start of the
processing at A stage.

Fig. 4 is a block diagram showing an apparatus of 
this embodiment. A coprocessor 50 corresponds to sec- 
ond execution unit 110. 

Processor 40 includes an instruction fetch unit 101
for fetching instructions from instruction memory bus 2
and instruction decoder 102 for receiving and decoding
instruction words from instruction fetch unit 3. These
units correspond to stage I and the first half of stage R.

At instruction decoder 4, an instruction word is
decoded according to the type of instruction, and items are
extracted including a function code 6a indicating the
function of the operation to be processed, immediate
operand 6b which is a constant operand embedded in the
instruction word, two source register numbers 6c and
6d, one destination register number 6e, and so forth.

Source register numbers 6c and 6d are sent to
general-purpose register group 103. In this embodiment,
the registers in general-purpose register group 7 are
named R0, R1, etc. The contents of registers
corresponding to source register numbers 6c and 6d are
fetched from general-purpose register group 7 and sent
to a first execution unit 106 as source operands 8a and
8b. Buses to be used for the above correspond to the
buses 114 and 115 of Fig. 3. This represents the second
half of the stage R. The first execution unit 106 is
responsible for the processing at A stage.

On the other hand, function code 6a is sent to a
pipeline controller 9. As shown in the same figure, the
pipeline controller 9 monitors the state of the entire
apparatus, controls stage progress, and decides the
issuance timing for the individual instructions. Issuance of
an instruction here means advancing the processing
from R stage to A stage, and more generally advancing to a
stage which may change the state of hardware.

When it becomes possible for an instruction to be
issued, function code 6a is sent as a function code 10a
to a first execution unit 106. At the first execution unit
106, an operation is performed according to function
code 10a furnished from pipeline controller 9 using
necessary values from among source operands 8a and 8b
obtained from the general-purpose registers and
immediate operand 6b. The significance of such an operation
differs depending on the type of instruction.

If the instruction is an operation instruction, for
example, the operation indicated by the instruction is
executed by the first execution unit 106. The operation
result is stored into an operation result holding unit 13. An
operation result 16 is sent to general-purpose register
group 103 through a general-purpose register write unit
15. At this time, as the register number of the write
destination, destination register number 6e of the instruction
is sent out as a destination register number 10b at the
appropriate timing by pipeline controller 9.
General-purpose register write unit 15 performs a write operation to
the general-purpose register using destination register
number 10b that was obtained from pipeline controller
9. This corresponds to stage W. In this case, stage M
corresponds to NOP where nothing is performed.

On the other hand, if the instruction being executed
is a memory access instruction, a calculation of the
memory address to be accessed is executed at the first
execution unit 106. A memory address 12 that is
obtained is passed to a data memory access section 108.
Data memory access section 108 performs reading and
writing of data memory 107 through data memory bus
IB. This corresponds to stage M. The execution of an
instruction to write into memory is completed by an
execution in data memory access section 14. In this case,
stage W corresponds to NOP. The execution of an
instruction to read from memory completes when data 17
that was read is written to the general-purpose register
through general-purpose register write unit 109, namely,
at the same time as stage W completes.
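
The stage usage of the three instruction classes described above can be summarized as a small lookup table. The following Python sketch restates the text only; the dictionary layout, the class labels, and the helper name completing_stage are assumptions introduced for the illustration.

```python
# What stages A, M and W do for each instruction class, as described above.
STAGE_ACTIVITY = {
    "operation": {"A": "execute operation",
                  "M": "NOP",
                  "W": "write result to register"},
    "store":     {"A": "compute memory address",
                  "M": "write data memory",
                  "W": "NOP"},
    "load":      {"A": "compute memory address",
                  "M": "read data memory",
                  "W": "write loaded data to register"},
}


def completing_stage(kind):
    """Return the last stage in which the instruction class does real work."""
    active = [s for s in ("A", "M", "W") if STAGE_ACTIVITY[kind][s] != "NOP"]
    return active[-1]


for kind in STAGE_ACTIVITY:
    print(kind, completing_stage(kind))
```

A store thus finishes its useful work at stage M, while operation and load instructions finish at stage W.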

Processor 40 further includes an interrupt signal in- 
put unit 27 for inputting an external interrupt signal 28. 
When an interrupt signal is input, pipeline controller 9 is 
notified of the input. 

On the other hand, coprocessor 50 performs data
transfers with processor 40 through buses, namely, a
coprocessor input bus 20 including operation result 16
in processor 40 and data 17 that was read from data
memory 107 (corresponding to a part of the bus 116 in
Fig. 3), a coprocessor input control bus 21 including
signals to control inputs, such as of data, to coprocessor
50, a coprocessor output control bus 23 including
signals to control outputs from coprocessor 50, and a
coprocessor output bus 24 including operation result data
and state signals that were output from the coprocessor
(corresponding to a part of the bus 117 in Fig. 3).

Coprocessor input bus 20 and coprocessor input
control bus 21 are referenced by a coprocessor input
controller 30. Coprocessor input controller 30 determines
into which register in coprocessor 50 data should be
input, and also prevents an input overflow of data to
coprocessor 50.

A coprocessor execution unit 31 performs an oper- 
ation when coprocessor input controller 30 accepts a 
data input. The operation result is supplied to a coproc- 
essor output controller 32 and stored in an output reg- 
ister. Coprocessor output controller 32 references co- 
processor output control bus 23, determines a register 
to which data should be output, and controls the output 
of data from the register. The data that is output is sup- 
plied to processor 40 through coprocessor output bus 
24. Coprocessor output controller 32 further outputs the
condition of the operation execution and the state of the
data input overflow judged in coprocessor input
controller 30 to coprocessor output bus 24, and offers
information to pipeline controller 9 in processor 40 necessary
for a stage progress halt, a cancellation of an instruction,
or the like.

Fig. 5 shows the internal organization of coprocessor
50. In this figure, a specific execution unit 213 within
coprocessor execution unit 31 is the one that performs
operations, and the execution of operations is controlled
by an execution controller 209. Specific execution unit
213 performs operations specific to the coprocessor
(mainly dyadic operations) such as floating-point
arithmetic, in addition to ordinary multiplication and division
operations. Data on which operations are to be performed
is supplied to specific execution unit 213 by two
input registers SR0 and SR1, which are provided together
with specific execution unit 213. Operation results are
stored by an output register SR2 within coprocessor
output controller 32. Output register SR2 outputs operation
results to coprocessor output bus 24. Although the
output register is shown here as a single unit, there may be
multiple units, in which case an output register
decoder 214 selects a register that is to output data
to the bus. In this embodiment, FIFOa 202 and FIFOb
203 are provided in front of input registers SR0 and SR1,
respectively. They are connected directly to coprocessor
input bus 20, and can store data supplied from processor
40. When processor 40 commands the coprocessor
to execute an operation, an input register decoder
210 decides to which one of the input registers SR0 or
SR1 data should be input. This decision is made by
input register decoder 210 looking at a coprocessor
register number that was placed on coprocessor input
control bus 21. A write command signal Wa or Wb is output
from input register decoder 210 to the FIFO where data
is to be input, at which time the data that was placed on
coprocessor input bus 20 is written to FIFOa 202 or
FIFOb 203. Specific execution unit 213 fetches data from
FIFOa 202 and FIFOb 203 each time an operation being
executed completes, and starts a new dyadic operation.

The overall path in coprocessor 50 of the FIFO
stage, input registers SR0 and SR1, specific execution
unit, and output register SR2 constitutes a pipeline
having a FIFO structure. Processing in this pipeline has a
certain synchronous relationship with the stages in the
pipeline at processor 40.
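
The data path through the coprocessor can be pictured with a short behavioural model. The Python sketch below is only an approximation under stated assumptions: it ignores clock timing and FIFO capacity, uses multiplication as the default dyadic operation, and lets each new result overwrite SR2. The class name CoprocessorModel and its method names are hypothetical and do not come from the specification.

```python
from collections import deque


class CoprocessorModel:
    """Behavioural model: two operand FIFOs feeding one dyadic operation."""

    def __init__(self, operation=lambda x, y: x * y):
        self.fifo_a = deque()   # queue in front of input register SR0
        self.fifo_b = deque()   # queue in front of input register SR1
        self.sr2 = None         # output register
        self.operation = operation

    def write(self, register, value):
        """Input register decoder: route data by coprocessor register number."""
        if register == "SR0":
            self.fifo_a.append(value)   # corresponds to write command Wa
        elif register == "SR1":
            self.fifo_b.append(value)   # corresponds to write command Wb
        else:
            raise ValueError(f"unknown input register {register}")

    def step(self):
        """Start one dyadic operation if both FIFOs hold an operand."""
        if self.fifo_a and self.fifo_b:
            sr0 = self.fifo_a.popleft()
            sr1 = self.fifo_b.popleft()
            self.sr2 = self.operation(sr0, sr1)
            return True
        return False


cp = CoprocessorModel()
cp.write("SR0", 6)
cp.write("SR1", 7)
cp.write("SR0", 3)          # a second SR0 operand waits in FIFOa
first = cp.step()           # consumes 6 and 7
second = cp.step()          # FIFOb is empty, so nothing starts
print(first, second, cp.sr2)
```

Queuing operands in this way lets the processor hand over data as soon as it is available, while the execution unit picks up the next operand pair whenever it becomes free.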

In Fig. 5, counting circuits a 204 and b 205 are
counters for incrementing count values when the
aforementioned respective write command signals Wa and
Wb are output. When output register decoder 214
commands the output of data from SR2, these counting circuits
are notified of this through execution controller 209 and
decrement the count values. Therefore, these counting
circuits indicate the numbers of data items stored in FIFOa
202 and FIFOb 203, respectively, and are referenced by
optical circuits (not shown) for optical purpose.

On the other hand, counting circuits ra 206 and rb
207 are counters for incrementing count values according
to write command signals Wra and Wrb from a
reservation register decoder 211, respectively, and
decrementing count values under conditions similar to those
for the aforementioned counting circuits a 204 and b
205. In this embodiment, data writing to coprocessor 50
is performed at stage W, which is the last stage of an
instruction. Thus, even if there are empty areas in FIFOa
202 and FIFOb 203 at the A stage of the following
instruction, those areas may be filled in the W stage of the
previous instruction. A reservation operation is therefore
useful and necessary in order to correctly know the state
of empty areas at A stage.

Write command signals Wra and Wrb are output
from reservation register decoder 211 when an
instruction decoded by instruction decoder 5 in processor 40
announces in advance write operations for SR0 and
SR1, respectively. Therefore, the number of increments
in counting circuits ra 206 and rb 207 is, as a result, the
same as that in counting circuits a 204 and b 205, but
with earlier increment timing. The counter values of
counting circuits ra 206 and rb 207 indicate sums of the
number of data items actually stored in FIFOa 202 and
FIFOb 203 and the number of data items to be stored in the
near future. If specific execution unit 213 always
performs dyadic operations, one of either counting circuit a
204 or b 205 is sufficient. However, to implement
operations where a value of SR0 is successively added to
the operation result, it is necessary to separately include
both counting circuits.

Coprocessor output controller 32 further includes a
pipeline process information generator 212. Pipeline
process information generator 212 outputs a coprocessor
ready signal (hereinafter simply "ready signal") 220.
A ready signal 220 is a signal used for processor 40 to
perform handshaking for processing with coprocessor
50 and is output under the following conditions:

30 

1. Data transfer to SR0

Taking reservations also into consideration, when
the value of counting circuit ra 206 is smaller than the
number of data items that can be stored in FIFOa 202.

2. Data transfer to SR1

Similarly, when the value of counting circuit rb 207
is smaller than the number of data items that can be
stored in FIFOb 203.

3. Data reading from SR2

When an operation result is present in the output
register.

When the ready signal is output, processor 40 issues
whichever of the instructions in items 1 to 3 above has been
made to wait.
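
The reservation scheme and ready conditions 1 and 2 can be illustrated with a minimal sketch. It is not the circuit of the embodiment: the class name `InputQueue`, its methods, and the capacity of two are hypothetical, and the rule that consuming an operand also releases its reservation is an assumption not stated in the text.

```python
from collections import deque


class InputQueue:
    """One coprocessor input FIFO plus its two counters (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fifo = deque()
        self.reserved = 0  # plays the role of counting circuit ra or rb

    @property
    def stored(self):
        # Plays the role of counting circuit a or b: items actually in the FIFO.
        return len(self.fifo)

    def ready(self):
        # Ready condition 1 or 2: with reservations included, is a slot free?
        return self.reserved < self.capacity

    def reserve(self):
        # Write command Wra/Wrb: the decoded instruction announces a write.
        if not self.ready():
            raise RuntimeError("no free slot: the instruction must wait")
        self.reserved += 1

    def write(self, value):
        # The announced write arrives later, at the W stage.
        self.fifo.append(value)

    def consume(self):
        # Assumption: taking an operand frees both the slot and its reservation.
        self.reserved -= 1
        return self.fifo.popleft()


sr0 = InputQueue(capacity=2)
sr0.reserve()
sr0.reserve()
print(sr0.stored, sr0.ready())  # prints "0 False"
```

After two reservations the FIFO still holds no data, yet the queue correctly reports that it is not ready. A check based only on the stored count would have admitted a third write.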

[Operations] 

The operation of the structure described above is
explained below, focusing on the operation related to
coprocessor 50.

Fig. 6 shows a state of pipeline processing by the
present apparatus. Similar to Fig. 2, four instructions are
necessary to execute one multiplication operation,
which is completed in two cycles. This process is different
from that shown in Fig. 2 in that SRs 0 and 1 are
used as operation registers in lieu of general-purpose
registers because coprocessor 50 executes multiplication
operations in this apparatus.

The two LD instructions in Fig. 6 may appear to be processed
in the same way as in the case of Fig. 2. However, this
apparatus is different from that related to Fig. 2 in that
writing of operation data into SRs 0 and 1 can be started
upon completion of M stage (time t0 in Fig. 6), since data
which has been read from memory at M stage is input
intact to coprocessor 50.

Following the LD instructions, a MUL instruction is executed.
As mentioned above, coprocessor 50 can start
execution of multiplication operations at
the start of M stage in this apparatus. Thus, A stage of
the MUL instruction can start without waiting until time t0, by
which all operation data become available. This allows
the respective I-A stages to start without a wait stage inserted.
Execution of a multiplication operation is started at
the start of M stage and completes within two stages,
that is, by the completion of W stage. Note
that stages for pipeline processing are actually initiated
late in this apparatus compared to the prior art, but initiation
timing viewed in absolute time is not delayed.
That is, execution of operations starts at the sixth cycle
counted from I stage of the initial LD instruction in both
cases of Figs. 2 and 6, and initiation times of the operations
are kept the same.

A subsequent ST instruction will be processed as
follows. Data read from SR2 is stored into address R10
in memory at M stage of the ST instruction. M stage must
wait until W stage of the MUL instruction, namely, the multiplication
operation, is completed. In Fig. 6, M stage is processed
in the cycle following W stage (starting at time t1).
The location of A stage is determined based on the location
of M stage. As for the ST instruction, it is always necessary
to insert one wait stage before M stage (because
two cycles are necessary for one multiplication operation
to complete). In this example, a wait cycle is inserted
as a prolonged R stage. Which of the R or A stage is
prolonged to create a wait stage depends on the apparatus
design. In either case, an extra stage is created
for waiting before a following XX instruction so that multiplication
processing is completed in five cycles in total.
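
The timing described for Fig. 6 can be reproduced with a small scheduling sketch. It is an illustration under stated assumptions, not the apparatus itself: the stage order I, R, A, M, W is inferred from the text, the function name `schedule` is hypothetical, the multiplication is assumed to occupy M and W of the MUL instruction, and the wait is modelled as a prolonged R stage as in the example.

```python
STAGES = ("I", "R", "A", "M", "W")


def schedule(ops, mul_latency=2):
    """Return, per instruction, the cycle in which each stage is entered."""
    table = []
    mul_done = 0  # last cycle occupied by the multiplication
    for i, op in enumerate(ops):
        entry = {}
        for s, stage in enumerate(STAGES):
            # An instruction spends at least one cycle in every stage.
            earliest = entry[STAGES[s - 1]] + 1 if s else 1
            if i:
                prev = table[i - 1]
                # The predecessor must already have vacated this stage.
                nxt = STAGES[s + 1] if s + 1 < len(STAGES) else None
                earliest = max(earliest, prev[nxt] if nxt else prev[stage] + 1)
            if op == "ST" and stage == "A":
                # Stay in R so that M begins only after the multiplication ends.
                earliest = max(earliest, mul_done)
            entry[stage] = earliest
        if op == "MUL":
            mul_done = entry["M"] + mul_latency - 1
        table.append(entry)
    return table


for op, row in zip(["LD", "LD", "MUL", "ST"], schedule(["LD", "LD", "MUL", "ST"])):
    print(op, row)
```

Under these assumptions the MUL instruction enters M stage in the sixth cycle, and the ST instruction spends two cycles in R stage so that its M stage falls in the cycle after W stage of the MUL instruction.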

The above has been an overview of the embodiment.
However, the improvements or modifications given
below are possible for the embodiment.

(1) The operations by coprocessor 50 are not limited
to such ordinary operations as monadic operations
and product and sum operations. For example,
operations apart from numerical operations may be
assigned to coprocessor 50, such as control of communications
with peripheral devices and communication
control between processors in a multiprocessor
apparatus.

(2) A five-stage pipeline was described here. However,
the pipeline may comprise any number of
stages and any of the processing stages.

(3) Although a RISC processor was given in this embodiment
as an example, a CISC or other architecture
may of course be used instead.

(4) Although coprocessor 50 is explicitly activated
by a MUL instruction to execute operations in this
embodiment, it may be activated implicitly. That is,
since coprocessor 50 comprises dedicated input
registers SR1 and SR0 apart from general-purpose
register group 103, coprocessor 50 may automatically
start execution of operations when these two
registers receive necessary data. Fig. 7 shows a
state of executing the same operation as that shown
in Fig. 6 without using a MUL instruction. In this diagram,
the multiplication operation itself is executed
between times t0 and t1, and M stage of the ST instruction
starts at t1, similarly to the case of Fig. 6. In order
to meet the above condition, two stages are inserted
for waiting during execution of the ST instruction, so
that the entire operation execution processing is
completed in five cycles, similarly to the
case of Fig. 6.
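
The implicit activation described in item (4) can be sketched as a data-triggered unit. This is only an illustration: the class `ImplicitMultiplier` and its methods are hypothetical, the FIFOs and pipeline timing are left out, and clearing the input registers once the operands are used is an assumption.

```python
class ImplicitMultiplier:
    """Toy coprocessor that multiplies once SR0 and SR1 both hold data."""

    def __init__(self):
        self.sr0 = None
        self.sr1 = None
        self.sr2 = None  # output register

    def write(self, register, value):
        if register not in ("sr0", "sr1"):
            raise ValueError("only sr0 and sr1 are input registers")
        setattr(self, register, value)
        # No MUL instruction: the arrival of the second operand is the trigger.
        if self.sr0 is not None and self.sr1 is not None:
            self.sr2 = self.sr0 * self.sr1
            self.sr0 = self.sr1 = None  # assumption: operands are consumed

    def read_sr2(self):
        if self.sr2 is None:
            raise RuntimeError("no result yet: the ST instruction must wait")
        result, self.sr2 = self.sr2, None
        return result


cop = ImplicitMultiplier()
cop.write("sr0", 6)   # first LD: nothing happens yet
cop.write("sr1", 7)   # second LD: multiplication starts automatically
print(cop.read_sr2())  # prints 42
```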

Claims 

1. A method for executing instructions through pipeline
processing, wherein

in addition to a main path for pipeline processing,
a subpath is provided in a manner of detouring
around a part of the main path for assuming
a portion of the main path function,

the subpath is connected such that processing
therein flows in an opposite direction to that in
the main path, and

a process is entrusted from the main path to the
subpath in a later stage of the pipeline processing
executed in the main path than a processed
result is received by the main path from the subpath.

2. A method as defined in claim 1, wherein

a process in the main path is executed by a
processor, and

a process in the subpath is executed by a coprocessor.

3. A method as defined in claim 2, wherein

hardware resources constituting the coprocessor
are reserved before a process is entrusted
from the processor to the coprocessor so as to
achieve smooth process entrustment.

4. An apparatus for executing instructions through
pipeline processing by dividing them into a plurality
of stages, comprising:

a plurality of stage process sections each responsible
for processing at each stage, constituting
a main path for pipeline processing; and

an execution section provided on a subpath for
executing a predetermined operation through
processing which flows in an opposite direction
to that in the main path.

5. An apparatus as defined in claim 4, wherein

a process in the main path is executed by a
processor, and

a process in the subpath is executed by a coprocessor.

6. An apparatus as defined in claim 5, wherein

the coprocessor starts execution of operations
in response to an instruction which explicitly
indicates a start of execution.

7. An apparatus as defined in any one of claims 5 and
6, wherein

the coprocessor automatically starts operation
execution upon receipt of data input.

8. An apparatus as defined in any one of claims 4-7,
wherein

a process is entrusted from the main path to
the subpath at a later stage of the pipeline processing
executed in the main path than a processed result
is received by the main path from the subpath.

9. An apparatus as defined in claim 8, wherein

hardware resources constituting the coprocessor
are reserved before a process is entrusted
from the processor to the coprocessor so as to
achieve smooth entrustment.

10. An apparatus comprising a processor for executing
instructions through pipeline processing and a coprocessor
for assuming a predetermined operation
among processes by the processor, wherein

the coprocessor is connected to the processor
in an opposite direction such that processing by
the coprocessor flows in an opposite direction to
that by the processor.

11. An apparatus as defined in claim 10, wherein

the pipeline processing by the processor includes
a memory access stage as one of
processing stages,

the processor comprises a memory access
section for controlling the memory access
stage, and

a data input unit of the coprocessor and a data
output unit thereof are connected to a data output
unit of the memory access section and a
data input unit thereof, respectively.

12. An apparatus as defined in claim 10, wherein

the processor comprises an instruction fetch
section, an instruction decode section, a general
execution section, a memory access section,
and a register write section, each responsible
for a corresponding stage of pipeline
processing, and

a data input unit of the coprocessor and a data
output unit thereof are connected to a data output
unit of the memory access section and a
data input unit thereof, respectively, via dedicated
buses.

13. An apparatus as defined in claim 12, wherein

input of data to the coprocessor is performed
after the memory access section completes a
process stage assigned thereto, and

reading of data from the coprocessor is performed
after the general execution section
completes a process stage assigned thereto.

14. An apparatus as defined in any one of claims 11-13,
wherein

the coprocessor is initiated to execute the predetermined
operation during M (memory access)
stage of an operation initiation instruction which is
executed by the processor.

15. An apparatus as defined in any one of claims 11-14,
wherein

the coprocessor is initiated to execute the predetermined
operation during W (register write)
stage of an operation initiation instruction which
writes data to the coprocessor.